This short introduction contains a quick tutorial suggested for users who have basic knowledge of R.
The code assumes the user knows how to run R code in the
console/terminal, and has an understanding of data.frame
objects and common R and tidyverse syntax.
However, anyone is welcome (and highly encouraged!) to follow along and play around with this tutorial, especially those curious about or interested in learning spatial data processing in R!
For more details, see Software and Package Versions.
To run all the code, use the Run drop down (top right of the code pane) and click Run All.
To render the document, click Knit (top left of the code pane) and a file will be generated in docs/index.html.

Install R packages if needed:
- tidyverse: general data processing package that works with tabular structured data
- lubridate: convenient functions for handling date and time data
- sf: spatially enables the tidyverse package to extend functionality for spatial data
- ggplot2: package for creating plots, including maps of sf objects
- tmap: useful for quick static and interactive maps of sf objects
- ggspatial: makes plotting general map elements like base maps, north arrows and scale bars easier
- prettymapr: required dependency for ggspatial to create base maps

install.packages("tidyverse")
install.packages("lubridate")
install.packages("sf")
install.packages("tmap")
install.packages("ggspatial")
install.packages("ggplot2")
install.packages("prettymapr")

Load libraries.
For this tutorial, we will be using the following data:
- data/toronto-ksi.geojson
- data/toronto-nbhoods.geojson
- data/toronto-ase.csv (for demonstration purposes, we use a non-spatial file with coordinates to start)

All data is available in the data folder.
To read spatial data, use the read_sf function.
Note: For shapefiles, simply use the .shp extension.
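Since the tutorial files are not bundled with this document, here is a minimal sketch using the nc.shp file that ships with sf; the tutorial's GeoJSON files are read the same way (e.g. read_sf("data/toronto-nbhoods.geojson")):

```r
library(sf)

# Read a shapefile bundled with the sf package; a .geojson or .shp file
# from the data folder is read the same way with read_sf("path/to/file")
nc <- read_sf(system.file("shape/nc.shp", package = "sf"))

# read_sf returns an sf object, which is also a data.frame
class(nc)
```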
Once you have read the spatial data into sf objects, each sf object should have a defined CRS (Coordinate Reference System), which determines how the coordinates are interpreted, the units of measurement, and the results of spatially related calculations.
For the KSI data, we can view the CRS with st_crs. Here it is WGS84 (World Geodetic System 1984, a global standard used by most spatial data), with public registry reference EPSG:4326:
## Coordinate Reference System:
## User input: WGS 84
## wkt:
## GEOGCRS["WGS 84",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["geodetic latitude (Lat)",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["geodetic longitude (Lon)",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4326]]
We also see the same CRS is used for the neighbourhoods data:
## Coordinate Reference System:
## User input: WGS 84
## wkt:
## GEOGCRS["WGS 84",
## DATUM["World Geodetic System 1984",
## ELLIPSOID["WGS 84",6378137,298.257223563,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["geodetic latitude (Lat)",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["geodetic longitude (Lon)",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## ID["EPSG",4326]]
In most situations, you do not need to modify or define the CRS: the sf package handles the calculations accurately, and most spatial data is distributed in WGS84. However, we should always double check.
To define the CRS when it is not given in the spatial data, we use st_set_crs with the correct EPSG number (4326 in our case), which can be found on epsg.io.
As an example, we will use a separate dataset, the nc.shp file from the sf package, but set the CRS to NA as a demonstration:
# Read example data
exdata <- read_sf(
system.file("shape/nc.shp", package="sf"),
crs = NA
)
# View CRS
st_crs(exdata)
## Coordinate Reference System: NA
Now we can set the CRS to the correct one, which is NAD27 with EPSG:4267:
# Set CRS to NAD27
exdata <- exdata %>%
st_set_crs(4267)
# View CRS again after setting it to NAD27
st_crs(exdata)
## Coordinate Reference System:
## User input: EPSG:4267
## wkt:
## GEOGCRS["NAD27",
## DATUM["North American Datum 1927",
## ELLIPSOID["Clarke 1866",6378206.4,294.978698213898,
## LENGTHUNIT["metre",1]]],
## PRIMEM["Greenwich",0,
## ANGLEUNIT["degree",0.0174532925199433]],
## CS[ellipsoidal,2],
## AXIS["geodetic latitude (Lat)",north,
## ORDER[1],
## ANGLEUNIT["degree",0.0174532925199433]],
## AXIS["geodetic longitude (Lon)",east,
## ORDER[2],
## ANGLEUNIT["degree",0.0174532925199433]],
## USAGE[
## SCOPE["Geodesy."],
## AREA["North and central America: Antigua and Barbuda - onshore. Bahamas - onshore plus offshore over internal continental shelf only. Belize - onshore. British Virgin Islands - onshore. Canada onshore - Alberta, British Columbia, Manitoba, New Brunswick, Newfoundland and Labrador, Northwest Territories, Nova Scotia, Nunavut, Ontario, Prince Edward Island, Quebec, Saskatchewan and Yukon - plus offshore east coast. Cuba - onshore and offshore. El Salvador - onshore. Guatemala - onshore. Honduras - onshore. Panama - onshore. Puerto Rico - onshore. Mexico - onshore plus offshore east coast. Nicaragua - onshore. United States (USA) onshore and offshore - Alabama, Alaska, Arizona, Arkansas, California, Colorado, Connecticut, Delaware, Florida, Georgia, Idaho, Illinois, Indiana, Iowa, Kansas, Kentucky, Louisiana, Maine, Maryland, Massachusetts, Michigan, Minnesota, Mississippi, Missouri, Montana, Nebraska, Nevada, New Hampshire, New Jersey, New Mexico, New York, North Carolina, North Dakota, Ohio, Oklahoma, Oregon, Pennsylvania, Rhode Island, South Carolina, South Dakota, Tennessee, Texas, Utah, Vermont, Virginia, Washington, West Virginia, Wisconsin and Wyoming - plus offshore . US Virgin Islands - onshore."],
## BBOX[7.15,167.65,83.17,-47.74]],
## ID["EPSG",4267]]
Now that there is a defined CRS, we can reproject them with st_transform to a different CRS, such as WGS84 (EPSG:4326), in case we do not have consistent CRS across different spatial data:
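A sketch of the reprojection, re-reading the bundled nc.shp (which ships with NAD27 already defined) so the snippet stands alone:

```r
library(sf)

# nc.shp comes with NAD27 (EPSG:4267) defined in its .prj file
exdata <- read_sf(system.file("shape/nc.shp", package = "sf"))

# Reproject the geometries to WGS84 (EPSG:4326)
exdata_wgs84 <- st_transform(exdata, 4326)

# The CRS now reports EPSG 4326
st_crs(exdata_wgs84)$epsg
```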
It is important to ensure that all your spatial data is in the same CRS before mapping or conducting any spatially related calculations or processes.
Each sf object is made up of ordered coordinates that
define the geometry of the objects.
Commonly, there are three general geometry types:

- POINT: pairs of xy coordinates making up points in space
- LINESTRING: multiple point coordinates (in order) that make up lines in space
- POLYGON: multiple point coordinates (in order) that enclose areas in space
- MULTIPOINT, MULTILINESTRING, MULTIPOLYGON: same as above, but with more than one geometry per row

Note: There are a few other, rarer geometry types seen here.
Points are simply pairs of coordinates:
# Set coordinates for Toronto
coordinates <- c(-79.3832, 43.6532)
# Convert coordinates to points
point <- st_point(coordinates)
# Show point
plot(point)
Lines are an ordered set of points.
To demonstrate let’s create a line from Toronto, through Ottawa, to Montreal (Canada).
First, create a set of ordered coordinates.
coordinates2 <- data.frame(
longitude = c(-79.3832, -75.7003, -73.5674),
latitude = c(43.6532, 45.4201, 45.5019)
)
coordinates2
Next, convert these coordinates to a multipoint object:

Finally, convert the multipoint to a line:

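The two conversions can be sketched as follows (re-creating coordinates2 so the snippet stands alone; the plots are not reproduced here):

```r
library(sf)

# The ordered Toronto -> Ottawa -> Montreal coordinates from above
coordinates2 <- data.frame(
  longitude = c(-79.3832, -75.7003, -73.5674),
  latitude = c(43.6532, 45.4201, 45.5019)
)

# Convert the coordinates to a single MULTIPOINT geometry
multipoint2 <- st_multipoint(as.matrix(coordinates2))

# Cast the MULTIPOINT to a LINESTRING (point order is preserved)
line2 <- st_cast(multipoint2, "LINESTRING")
plot(line2)
```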
Polygons are sets of ordered coordinates with an enclosed area formed by joining the starting and ending coordinate.
To demonstrate, we will create an ordered set of coordinates for Toronto, Kingston, and Ottawa (Canada).
Note: The last coordinate is the same as the first one to form the enclosed area.
coordinates3 <- data.frame(
  longitude = c(-79.3832, -76.4930, -75.7003, -79.3832),
  latitude = c(43.6532, 44.2334, 45.4201, 43.6532)
)
coordinates3
We then want to convert these coordinates to a multipoint object as shown previously:

Lastly, we can enclose these points to form a triangular area:

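These two steps can be sketched as follows (re-creating coordinates3 so the snippet stands alone):

```r
library(sf)

# The closed ring of Toronto -> Kingston -> Ottawa -> Toronto coordinates
coordinates3 <- data.frame(
  longitude = c(-79.3832, -76.4930, -75.7003, -79.3832),
  latitude = c(43.6532, 44.2334, 45.4201, 43.6532)
)

# Convert to MULTIPOINT, then enclose the ring as a POLYGON
multipoint3 <- st_multipoint(as.matrix(coordinates3))
polygon3 <- st_cast(multipoint3, "POLYGON")
plot(polygon3)
```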
Sometimes the data file is not in a spatial format such as .geojson or .shp; instead, coordinates such as longitude and latitude are given along with the Coordinate Reference System (CRS).
For these cases, we can first read in the non-spatial data normally
and then convert the coordinates into sf objects after.
As an example, we use the non-spatial ASE data in Comma Separated
Values (CSV) format, which contains longitude and latitude coordinates
that can be converted into a sf object.
Note: According to the Toronto ASE Data Page, the CRS is WGS84 (EPSG:4326).
Next, we convert the data into an sf object, using the longitude and latitude columns (holding our coordinates) and CRS EPSG:4326:
ase <- ase %>%
st_as_sf(
coords = c("longitude", "latitude"), # cols for coordinates
crs = 4326 # CRS is WGS84
)
Then we can inspect our data to see that the coordinates have been
converted successfully with the addition of a geometry
column:
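The resulting output is not reproduced here; a self-contained sketch of the same conversion with toy data shows the effect:

```r
library(sf)
library(dplyr)

# Toy stand-in for the ASE table: a plain data.frame with coordinate columns
ase_toy <- data.frame(
  location = c("A", "B"),
  longitude = c(-79.38, -79.40),
  latitude = c(43.65, 43.70)
)

ase_sf <- ase_toy %>%
  st_as_sf(coords = c("longitude", "latitude"), crs = 4326)

# The coordinate columns are replaced by a geometry column of POINTs
names(ase_sf)
```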
As you have seen in the previous section, spatial sf data works similarly to standard data.frame structures, except for the addition of a geometry column, usually at the end of the columns.
Let's preview the non-spatial portion of the KSI data:
and also the Neighbourhoods data:
Most dplyr functions from
tidyverse will work with sf objects, allowing
you to easily manipulate the non-spatial portions of the read spatial
data.
For example, we get the number of KSI collisions each year using the
year function from lubridate and
count from dplyr:
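Since the KSI file is not bundled here, the pattern is sketched with a toy DATE column; the same call works on the ksi object:

```r
library(dplyr)
library(lubridate)

# Toy stand-in for ksi with a DATE column
ksi_toy <- tibble(DATE = as.Date(c("2021-03-01", "2022-05-09", "2022-11-20")))

# Extract the year from each date and count records per year
per_year <- ksi_toy %>%
  count(year = year(DATE))
per_year
```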
You can preview spatial data using plot, which by default generates a small map for each of up to the first 9 columns.
Let’s preview the KSI data:

We can also preview a specific variable:

and again for the Neighbourhoods data:
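The plot output is not reproduced here; a sketch with the bundled nc.shp data (the same calls work on the ksi and nbhoods objects):

```r
library(sf)

nc <- read_sf(system.file("shape/nc.shp", package = "sf"))

# Default preview: a small map for each of up to the first 9 columns
plot(nc)

# Preview a specific variable by subsetting a single column
plot(nc["AREA"])
```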
In addition to previewing spatial data statically, we can utilize the
tmap package to preview spatial data interactively.
Let’s try it with the KSI data (we limit it to the first 1000 records due to the size of the data):
- tmap_mode sets the visualization mode to interactive
- tm_shape tells the tmap package to use the ksi data
- head filters for the first 1000 rows
- tm_dots visualizes the data as point data

tmap_mode("view") # set to interactive mode
tm_shape(ksi %>% head(1000)) +
tm_dots(
col = "ACCLASS", # color based on column
clustering = TRUE, # group points together for cleaner visuals
popup.vars = TRUE # allow user to click each point to inspect data
)
We can also do the same for the Neighbourhoods data, but with tm_polygons from tmap instead:
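A sketch of the polygon version, using tmap's bundled World dataset as a stand-in so the snippet is self-contained; for the tutorial, replace World with the nbhoods object:

```r
library(tmap)

data(World)  # example polygons bundled with tmap, standing in for nbhoods

nbhoods_map <- tm_shape(World) +
  tm_polygons()

nbhoods_map  # renders interactively when tmap_mode("view") is set
```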
A common spatial data processing task is to apply buffers (radial extensions of spatial geometries) to spatial objects, which can be done using the st_buffer function.
Let’s try this for a couple of records in the KSI data, which are originally points as seen below:
# Select the first 3 records of KSI
ksi_buff <- ksi %>%
head(3)
# Display the KSI geometry as points
ksi_buff %>%
select(geometry) %>%
plot
Then we can buffer the points in the KSI data by 1000 meters. Note that although WGS84 coordinates are in degrees, sf performs spherical (s2) calculations for geographic CRS, so the buffer distance is taken in meters:
# Buffer the point geoms by 1000m
ksi_buff <- ksi_buff %>%
select(geometry) %>%
st_buffer(dist = 1000)
# Plot polygon geometries after buffer
ksi_buff %>%
select(geometry) %>%
plot
Applying buffers to points is useful for capturing other spatial objects within a certain distance of each point - relating captured spatial objects to each point.
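The meter-based behaviour can be checked with a self-contained sketch that buffers a single WGS84 point and measures the result:

```r
library(sf)

# A single lon/lat point (Toronto), CRS WGS84
pt <- st_sfc(st_point(c(-79.3832, 43.6532)), crs = 4326)

# Buffer by 1000 m; for a geographic CRS, sf computes this on the sphere
buff <- st_buffer(pt, dist = 1000)

# The buffered area should be close to pi * 1000^2 square meters
st_area(buff)
```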
Another commonly used spatial function is spatial joins, which can be done using the st_join function.
Let’s try getting the number of KSI collisions in 2022 within each Neighbourhood.
First, we filter out all KSI collisions for year 2022:
# Filter for 2022 data
ksi_2022 <- ksi %>%
filter(year(DATE) == 2022)
# Preview the data
ksi_2022 %>% as_tibble
Then, we must join each KSI point to each neighbourhood polygon:
# Join KSI data to be within each neighbourhood polygon
nbhoods_join <- nbhoods %>%
st_join(ksi_2022)
# Preview the joined data
nbhoods_join %>% as_tibble
Finally, we can apply the count function to get the
number of collisions per neighbourhood in column n:
# Count the joined ksis per nbhood
nbhoods_ksi_2022 <- nbhoods_join %>%
count(AREA_ID, AREA_NAME)
# Preview ksi per nbhood in 2022
nbhoods_ksi_2022 %>% as_tibble
We can also plot these spatially:

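The join-and-count pattern can also be verified with a minimal self-contained sketch; the square "neighbourhoods" and point locations below are made up for illustration:

```r
library(sf)
library(dplyr)

# Two unit-square "neighbourhoods" (rings are closed and counterclockwise)
nbhoods_toy <- st_sf(
  AREA_NAME = c("A", "B"),
  geometry = st_sfc(
    st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0)))),
    st_polygon(list(rbind(c(1, 0), c(2, 0), c(2, 1), c(1, 1), c(1, 0)))),
    crs = 4326
  )
)

# Three "collision" points: two inside A, one inside B
pts_toy <- st_sf(
  id = 1:3,
  geometry = st_sfc(
    st_point(c(0.5, 0.5)), st_point(c(0.6, 0.4)), st_point(c(1.5, 0.5)),
    crs = 4326
  )
)

# Join the points into the polygons they fall in, then count per polygon
ksi_counts <- nbhoods_toy %>%
  st_join(pts_toy) %>%
  count(AREA_NAME)
ksi_counts
```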
The plot function is convenient for previewing spatial
data quickly, but makes it difficult to add standard map elements such
as base maps, north arrows and scale bars.
For publications, we can use the ggspatial package to
create publication ready maps.
As an example, we will utilize our joined 2022 KSI collision counts for each neighbourhood in Toronto, and create a publication ready map.
We will also use the Automated Speed Enforcement (ASE) data for Toronto to map an extra layer on top of the neighbourhood collision data.
Now that we have all the data we need, let’s create a publication ready map:
# Create pub ready map
ksi_map <- ggplot() +
annotation_map_tile(
zoomin = 1, # affects resolution of base map tiles
type = "cartolight", # type of base map
cachedir = "../data/cache" # cache the map tiles to avoid redownload
) +
annotation_north_arrow(
width = unit(0.2, "cm"),
height = unit(0.5, "cm"),
location = "tr" # where to place the arrow
) +
annotation_scale(
style = "ticks",
location = "br" # where to place the scale bar
) +
labs(
title = "2022 Neighbourhood Collisions (Toronto, ON, CA)", # map title
caption = "Data Source(s): Toronto Police Service Public Safety Portal, City of Toronto Open Data\nCRS: WGS84 / *Automated Speed Enforcement (ASE), Killed or Seriously Injured (KSI)" # map note
) +
layer_spatial( # add spatial data layer for collisions
nbhoods_ksi_2022,
aes(fill = n) # set the fill color to n, which is num of collisions
) +
layer_spatial( # add spatial data layer for ASE
ase,
fill = NA,
aes(color = "ASE* Camera") # set name of points for ASE
) +
scale_color_manual(
values = c("ASE* Camera" = "salmon") # change ase point color
) +
guides(
fill = guide_legend(
order = 1,
title = "KSI* Collisions" # legend title for collisions
),
color = guide_legend(
order = 2,
title = element_blank() # remove legend title for ase
)
) +
theme(
plot.caption = element_text(size = 6, color = "darkgray"), # alter note text
legend.key = element_blank(), # remove grey legend background in ase points
plot.title = element_text(hjust = 0.5) # center map title
)
# Show pub ready map
ksi_map
We can save the map we made in the previous section into a file with ggsave - ready to be included in a publication:
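A sketch of saving a map with ggsave; the filename, size and dpi here are assumptions, and a bundled dataset stands in for the map so the snippet runs on its own (for the real map, pass plot = ksi_map):

```r
library(ggplot2)
library(sf)

# Stand-in map built from sf's bundled nc.shp; substitute ksi_map as needed
p <- ggplot(read_sf(system.file("shape/nc.shp", package = "sf"))) +
  geom_sf()

# Output path, dimensions and resolution are assumptions
outfile <- file.path(tempdir(), "ksi_map.png")
ggsave(outfile, plot = p, width = 8, height = 6, dpi = 300)
```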
Now that we are done processing our data, we should save the results to a file with the write_sf function.
Let’s save our joined 2022 KSI collision counts for each
neighbourhood in the data folder:
# Save a geojson file
nbhoods_ksi_2022 %>%
write_sf("../data/toronto-nbhoods-ksi-2022.geojson")
# Save a shapefile
nbhoods_ksi_2022 %>%
write_sf("../data/toronto-nbhoods-ksi-2022.shp")
Later on, we can read the saved shapefile back in:
# Read the shapefile back in
nbhoods_ksi_2022 <- read_sf("../data/toronto-nbhoods-ksi-2022.shp")
# Preview the data
nbhoods_ksi_2022 %>% as_tibble
We should also save our ASE data that we converted from a non-spatial into a spatial format:
That concludes the tutorial! For further information and other tutorials, see:
Getting the visualization and mapping of spatial data right can be complicated, so alternatives for designing and creating maps more interactively and manually include:
R and R package versions:
## R version 4.3.1 (2023-06-16)
## Platform: x86_64-apple-darwin20 (64-bit)
## Running under: macOS Sonoma 14.4.1
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/Toronto
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] lubridate_1.9.3 forcats_1.0.0 stringr_1.5.0 dplyr_1.1.2
## [5] purrr_1.0.1 readr_2.1.4 tidyr_1.3.0 tibble_3.2.1
## [9] ggplot2_3.5.1 tidyverse_2.0.0 tmap_3.3-4 prettymapr_0.2.5
## [13] sf_1.0-16 ggspatial_1.1.9
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.0 viridisLite_0.4.2 farver_2.1.1
## [4] fastmap_1.1.1 leaflet_2.2.2 XML_3.99-0.16.1
## [7] digest_0.6.33 timechange_0.2.0 lifecycle_1.0.4
## [10] ellipsis_0.3.2 terra_1.7-71 magrittr_2.0.3
## [13] compiler_4.3.1 rlang_1.1.2 sass_0.4.7
## [16] tools_4.3.1 utf8_1.2.3 yaml_2.3.7
## [19] knitr_1.43 labeling_0.4.2 htmlwidgets_1.6.2
## [22] bit_4.0.5 sp_2.1-3 classInt_0.4-10
## [25] plyr_1.8.8 RColorBrewer_1.1-3 abind_1.4-5
## [28] KernSmooth_2.23-21 withr_2.5.0 leafsync_0.1.0
## [31] grid_4.3.1 fansi_1.0.4 e1071_1.7-13
## [34] leafem_0.2.3 colorspace_2.1-0 scales_1.3.0
## [37] dichromat_2.0-0.1 cli_3.6.1 rmarkdown_2.23
## [40] crayon_1.5.2 ragg_1.2.5 generics_0.1.3
## [43] rstudioapi_0.15.0 tzdb_0.4.0 tmaptools_3.1-1
## [46] DBI_1.1.3 cachem_1.0.8 proxy_0.4-27
## [49] stars_0.6-5 parallel_4.3.1 rosm_0.3.0
## [52] s2_1.1.6 base64enc_0.1-3 vctrs_0.6.4
## [55] jsonlite_1.8.7 hms_1.1.3 bit64_4.0.5
## [58] systemfonts_1.0.4 crosstalk_1.2.1 jquerylib_0.1.4
## [61] units_0.8-5 glue_1.6.2 lwgeom_0.2-14
## [64] leaflet.providers_2.0.0 codetools_0.2-19 stringi_1.7.12
## [67] gtable_0.3.3 raster_3.6-26 munsell_0.5.0
## [70] pillar_1.9.0 htmltools_0.5.5 R6_2.5.1
## [73] textshaping_0.3.6 wk_0.9.1 vroom_1.6.3
## [76] evaluate_0.21 lattice_0.21-8 highr_0.10
## [79] png_0.1-8 bslib_0.5.0 class_7.3-22
## [82] Rcpp_1.0.11 xfun_0.39 pkgconfig_2.0.3
RStudio version:
## $citation
## To cite RStudio in publications use:
##
## Posit team (2023). RStudio: Integrated Development Environment for R.
## Posit Software, PBC, Boston, MA. URL http://www.posit.co/.
##
## A BibTeX entry for LaTeX users is
##
## @Manual{,
## title = {RStudio: Integrated Development Environment for R},
## author = {{Posit team}},
## organization = {Posit Software, PBC},
## address = {Boston, MA},
## year = {2023},
## url = {http://www.posit.co/},
## }
##
## $mode
## [1] "desktop"
##
## $version
## [1] '2023.6.1.524'
##
## $long_version
## [1] "2023.06.1+524"
##
## $release_name
## [1] "Mountain Hydrangea"